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Abstract 

We propose a versatile and computationally efficient estimating equation method 
for a class of hierarchical multiplicative generalized linear mixed models with 
additive dispersion components, based on explicit modelling of the covariance 
structure. The class combines longitudinal and random effects models and 
retains a marginal as well as a conditional interpretation. The estimation pro- 
cedure combines that of generalized estimating equations for the regression 
with residual maximum likelihood estimation for the association parameters. 
This avoids the multidimensional integral of the conventional generalized linear 
mixed models likelihood and allows an extension of the robust empirical sand- 
wich estimator for use with both association and regression parameters. The 
method is applied to a set of otolith data, used for age determination of fish. 

Key words: Bias correction; Best linear unbiased predictor; Crowder weights; 
Dispersion components; Generalized estimating equation; Nuisance parame- 
ter insensitivity; Pearson estimating function; Residual maximum likelihood; 
Tweedie distribution. 



1 Introduction 



Ever since Nelder and Wedderburn (1972) introduced generalized linear 



models for independent data, there has been a steady development of methods 
for analysis of non-normal correlated data. This development was accelerated 
by the introduction of generalized estimating equations (Liang and Zegerl|1986l) 



and generalized linear mixed models (Schall, 1991; Breslow and Clayton[ |1993| 
Wolfinger and O'Connell 1993). These two types of models differ conceptu- 



ally and computationally as reflected in the conventional distinction between 
marginal and conditional models. 

In practice, however, one is often faced with a combination of longitudinal 
and random effects, where neither of the two, on their own, are adequate. In 
this light, it is somewhat of an enigma why generalized estimating equations 
and generalized linear mixed models have continued to evolve along separate 



paths. With few exceptions (McCulloch and Searle, 2001 Diggle et al. 2002 



Fitzmaurice et al. 2004), the literature is clearly divided into two separate 



strands. Pinheiro and Bates (2000) combined the two approaches for normal 



data. Ziegler et al. (1998) and Hall (2001) summarized the first decade of devel- 



opments for generalized estimating equations and recent contributions include 



for example Hardin and Hilbe 



(2003), Wang and Carey (2004), Coull et al. 



(2006) and Wang and Hanfelt (2007). For generalized linear mixed models, we 



refer to recent monographs, such as Verbeke and Molenberghs (2000), Lee et al. 



(2006) and references therein. 



In the present paper we propose a versatile and computationally efficient 
method for generalized linear longitudinal mixed models. The estimating equa- 
tions used for the regression parameters are similar in style to those known 
from conventional generalized estimating equations, whereas Pearson estimat- 
ing equations are used for the association parameters. The method combines 



2 



longitudinal and random effects models while retaining a marginal as well as a 
conditional interpretation. A serial correlation structure is employed within 
clusters, but unlike the state space models of J0rgensen et al. (1999) and 



J0rgensen and Song (2007), where the latent process is non-stationary, our 



approach is based on a stationary latent process defined by means of a linear 
filter. 

In order to avoid the intricate efficiency considerations associated with con- 
ventional generalized estimating equations, see eg. Wang and Hanfelt (2003), 
we emphasize explicit modelling of the covariance structure based on second- 
moment assumptions for the data-generating process. An example of the utility 
of this approach is the twin data analysis by Iachina et al. (2002), where the 
correlation between the two twins in a pair was estimated on the basis of a gen- 
eralized estimating equations model. Nevertheless, the method allows the use 
of a working correlation structure and we extend the robust empirical sandwich 
estimator to handle the asymptotic variance for both regression and association 
parameters. 

Our estimation of the association parameters compares to that of residual 
maximum likelihood by means of the bias-corrected Pearson estimating equa- 



tions of J0rgensen and Knudsen (2004), following earlier work by McCullagh 



and Tibshirani (1990) and Hall and Severini (1998). 



The estimating equations are solved by an efficient Newton scoring algo- 
rithm, thereby circumventing the problems associated with the multidimen- 
sional integral defining the likelihood in conventional generalized linear mixed 
models approaches such as Schall (1991), Breslow and Clayton (1993) and 



Wolfinger and O'Connell (1993). 



The models covered by our approach are defined hierarchically via condi- 
tional Tweedie distributions, much like the multiplicative mixed effects models 



(2007 


) and 


Ma et al. 


(2009) 



tive mean structure with a corresponding additive decomposition of the variance 
into dispersion components, thereby retaining much of the simplicity of classical 
linear models. The Tweedie family provides a flexible class of models for both 
positive continuous data, count data and positive continuous data with a point 
mass at zero (J0rgensen 1997, Ch. 4), see also J0rgensen and Song (2007). 



2 Model 



2.1 Model specification 

We now introduce the main type of our approach followed by a discussion of 
their covariance structure, which is crucial for the interpretation and estimation 
of the models. The model is based on the class of Tweedie exponential disper- 

Ch. 4). A Tweedie variable Y ~ Tw r (/i, a 2 ) is 
H and var (Y) = a 2 fi r . Special cases include 
the normal (r = 0), Poisson (r = 1 and a 2 = 1) and gamma (r = 2) families. 
The case 1 < r < 2 correspond to compound Poisson distributions, which are 
non-negative and continuous, except for a positive probability at zero. 



sion models (J0rgensen 



1997 



characterized by having E (Y) 
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For a given cluster i and time t consider a response vector Yn with condition- 
ally independent Tweedie distributions. For ease of presentation we consider a 
balanced design with T equidistant observation times common to all / clusters. 
The model is readily adapted to ragged structures. 

The cluster random effect is represented by independent latent Tweedie 
variables, 

Zi ~ Tw ri (1, a 2 ) , (1) 

where r\ > 2 in order to make Zi positive. Given the cluster variable Zi, we 
consider a latent process based on Tweedie noise, 

Z lt \Zi = Zi ~ Tw r2 (a^Zi , a r ^Lo 2 zl~ r ^ , (2) 

where the Zn are assumed conditionally independent given the variables Zi, 
and again r<i > 2. Here a + = Y1T=0 a k < oo with qzq = 1 and G [0, 1) for 
k > . The coefficients determine a conditionally weakly stationary latent 
process Qa, defined by the linear filter 

oo 

Qit = ^^oisZit-s. (3) 

By way of construction of the middle layer of the latent process Qa has mean 
1. At the observation level we assume 

Y it j | Q u = <?« ~ Tw ra (jHtqa, P 2 Qu~ r:i ) > ( 4 ) 

with conditional independence given Qj*. Here the * notation denotes the vector 
obtained by letting the corresponding index run, so that = (Yn, . . . , Yjr) T 
and so on. By definition of the Tweedie variance function we obtain linearity in 
the conditioning variable of the mean and variance, so that E (Ya \ Q^) = QitfHt 
and var (Yu \ Q^) = qup 2 //^- This property is essential for the derivation of the 
covariance structure and the estimating functions below. In the Poisson case 
(r3 = 1) it is convenient to let the dispersion parameter p 2 > accommodate 
potential over- or under-dispersion. 

A variant of the model, applicable to normal response variables, assumes 
normal zero mean random cluster effects, Z; L ~ N(0,a 2 ), replacing ([l]). The 
noise process ^ is then assumed to be Gaussian, Zu \ Zi = z% ~ N{zi,uj 2 ), 
while maintaining the filter ^ as above. At the response level Q is replaced 
by an additive structure with identity link function, Ya \ = ~ N(fiu + 
qu,p 2 )- Since the structure is linear, the corresponding covariance structure is 
easily derived. 

The marginal means may depend on covariates na = Pit ( x it P) , where (3 
denotes a vector of regression parameters. With positive data, the log link is 
a suitable choice, providing a natural interpretation of the regression parame- 
ters. Furthermore the log link, along with the multiplicative structure, enables 
easy comparison with conventional generalized linear mixed models, where the 
random effects enter as a term in the linear predictor: rja = log (puqu) = 
xJtf3 + 1og (q it ). 



4 



2.2 Covariance structure 



The marginal variance-covariance matrix of the observation vector Y„ is 
crucial for the inference and estimation procedures. Detailed derivations are 
found in Appendix 1. 

The covariance between two given observations within the ith cluster is 



cov (Y it , Y it ,) = a 2 flu/Jit' + mtHit' ^2a s a s+ \ t _ t/ \ + S t t 'p 2 p. 



it • 



where 6j' is the Kronecker delta, being 1 for i = i' and zero otherwise. It is 
an important property of the model that the covariance does not depend on r\ 
and T2- From this we now derive a matrix expression for var (Y**). 

First we consider the latent process correlation matrix, K (a), with ii'th 
entry Yl^Lo a s a s+\t-t'\- Next let It denote the T- vector of Is. In matrix 
notation the variance-covariance matrix of the response vector for the ith cluster 
may then be expressed as 



var (Vj*) = © 1t1? + w 2 K (a)} + p 2 diag (p. 



+ diag (lAfr) K (a) diag (^J + p 2 diag (^ 3 ) ; (5) 



say, where is the Hadamard (elementwise) product ( Magnus and Neudecker[ 



1999[ p. 45). 

Similar to conventional linear mixed models, the variance is decomposed into 
components of dispersion corresponding to the different sources of variation. 
The covariance terms in ^ reflect the observation error, the covariance within 
cluster and the variation between clusters. 

The models, accommodated by our approach, hence extend the range of 
possible serial correlation patterns compared with the conventional general- 
ized estimating equations correlation structures usually considered. Particular 
covariance structures may be obtained by imposing restrictions on the linear 
filter parameter vector a or the dispersion parameters. Table [l] lists the more 
common covariance structures and the corresponding parameter restrictions. 



3 Estimation 

3.1 General issues 

The set of parameters to be estimated is naturally partitioned into re- 
gression and association parameters, 6 = (/3 T ,7 T ) , where the regression pa- 
rameters (3 usually are those of interest whereas the association parameters ■j, 
containing dispersion and correlation parameters, are often considered nuisance 
parameters. For estimation of the parameters we use a set of corresponding es- 
timating functions denoted ip = . 

The estimating function for the regression parameters is 

^^^DjCr^Y^EiYi)), (6) 

i=l 
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Table 1: Standard covariance structures. The MA(p)-type and AR,(p)-type refer 
to the latent process correlation structure conditionally on the cluster random 
effects. 



Covariance structure Parameter restrictions 



Independent 


1 


= a 2 = 0. 




Exchangeable 




= = and a s = 


for s > 0. 


MA(p)-type 


a s - 


= for s > p. 




AR(p)-type 


For 


p = 1 a s = a s . For 


p > 1 the a s are given by the 




Yule- Walker equations. 




GLMM 


-I 


= and a s = for s 


> 0. 



where D { = VpE(Yi) = dE(Yi) /d/3 T and d = var(l^). Although Q is 
similar to the well known estimating function for the regression parameters from 



the conventional generalized estimating equations framework (Liang and Zeger 



1986), it corresponds to using the model covariance matrix (|5) rather than the 
working covariance matrix. The conventional generalized estimating equations 
working covariance matrix is built around the working correlation matrix i?(a) 
so that 

var (Yi) = cf>Al /2 R (a) A 1 / 2 (/xj , (7) 

where <j) is a dispersion parameter, A% (fij) = diagjw (/^*)} and v (•) is the vari- 
ance function. In contrast, we emphasize the decomposition of the variance into 
components of dispersion and associate the process correlation matrix K(a) 
with an appropriate level in the hierarchy. 

We use Pearson estimating functions for the estimation of the associa- 
tion parameters 7 = (71, . . . ,7at) t . The entire vector of functions is denoted 



component given by 



where N = 3 + M and M = dim (a) and with the nth 



Vv (A 7) = E tr { w *n {nrJ - Ci) } 



i=i 



where = Yi — E(Yj) and W{ n are suitable weights. This form emphasizes 
the model covariance matrix in contrast to the more conventional expressions 



of the Pearson estimating function (J0rgensen and Knudsen, 2004) 



The estimating functions and ■j/> 7 are explained in more detail below, in 
Section |3.4| and |3.5| respectively. 



3.2 Sensitivity 



Cox and Reid (1987) studied parameter orthogonality in the likelihood 



framework corresponding to block diagonality of the Fisher information ma- 
trix. J0rgensen and Knudsen (2004) studied the corresponding property of 
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nuisance parameter insensitivity in an estimating equation context, by means 
of the sensitivity matrix, denned by Sg = E {Vg^(9)}, where V is the gradient 
operator. The sensitivity matrix may be partitioned into blocks corresponding 
to (t/Zg,'!/^) and (/3 T ,7 T ) as follows: 



Sg 



Sp (0) 
S ~/P (0) 



>/3 7 



(0) 



S 7 (0) 



E{Vp^((3,j)} £{V 7 </> 7 (/3, 7 )} 



Nuisance parameter insensitivity (for short denoted 7-insensitivity) is defined 
by the upper right-hand block Sp^ (9) being zero. First of all this implies effi- 
ciency stable estimation of /3, meaning th at t he estimation of 7 does not affect 



the asymptotic variance of (3; see Section 3.7 Second, it simplifies the Newton 



scoring algorithm (J0rgensen and Knudsen, 2004) as detailed below. Third, it 
implies that (3^ varies only slowly with 7, where /3_, is the estimate of (3 for 
with 7 known. While nuisance parameter-insensitivity does not ensure asymp- 
totic independence of f3 and 7, it does ease the computation of the asymptotic 
variance of (3. 



Following J0rgensen and Knudsen (2004) it is easily seen that tpp is 7- 
insensitive. In fact, from (JgJ) we see that i\)p depends on 7 only via C^ 1 and 
hence V-ytpp (/3,7) has zero mean, i.e. Sp-y (0) = 0. 

In the rest of the paper we write Sp for Sp (0) etc, whenever the mean- 
ing is unambiguous. The remaining blocks of Sg are detailed along with the 



estimating functions in Sections 3.4 and 3.5 



3.3 Algorithm 

Calculation of the parameter estimates is achieved by the Newton scoring 



algorithm (J0rgensen et al. 1996) in which we update the previous value of 
by 



e* = e - s g ^ (e) . 

By the regularity of the 7-insensitivity of tpp, and using simple matrix 
manipulations we may express the inverse of Sg in blocks as follows: 



s- 1 



~Sj SypSp Sj 



The algorithm therefore splits into a (3 step 

(3*=(3-Sp 1 < l Pp(6) 

and a 7 step 

7* = 7 + S-'S^pS^p (6) - S~ V 7 (0) 



(8) 



(9) 



7 



9) 



S^pSp 



V/3 (e)} 



(10) 



Following J0rgensen and Knudsen (2004) we insert (3* from ^ into equation 
(HO ). Since equation (|9|) can be rewritten as —Sp 1 (0) tpp (6) = (3* — f3, this 
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makes S^ 1 (9*) ip 3 (0*) = 0, where 6* indicates (3* is being used. Consequently 
the modified 7 step becomes 



7* = 7 -S-Vv (#*)• 



(11) 



Analogously we use the most recent estimate of 7 when updating (3 i n (|9[). This 
is however of less importance, due to the slow variation of /3 7 with 7. J0rgensen 
and Knudsen (2004) coined the scheme of alternating between ^ and (11) the 
chaser algorithm, with reference to the asymmetric interdependence between 
(3* and 7*. Our experience with the algorithm confirms this property. 



3.4 Regression parameters f3 



Following Ma (1999), Ma et al. (2003) and Ma and J0rgensen (2007) we 



use best linear unbiased predictor for predicting the random effects. The best 
linear unbiased predictor of a random variable Q given the observed data Y is 
defined by flHenderson[ |1975[ [Ma] |1999[ ) 



Q = E{Q) + cov (Q, Y) var (Y)' 1 {Y - E (Y)} 



(12) 



The model specification using Tweedie distributions allows for derivation of the 



joint score function u (6; Y, Q) (Ma and J0rgensen, 2007). We define unbiased 
estimating functions xjjphy substituting the random effects by their respective 
best linear unbiased predictors, i.e. 
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0,Y,Q 



It follows from ( 12 ) and the linearity of E (•) and cov (•, Y) that the best linear 



unbiased predictor of AQ + BY given Y is AQ + BY, where A and B are 
matrices of suitable dimensions. Since u is linear in both the observed and 
the latent variables, Y and Q, we find that ipg is the best linear unbiased 
predictor of the score function u given the data. By ( 12 ) we therefore arrive at 
the conventional generalized estimating equations form expression ^ 



^ (3 = E («) + cov (u, Y) C- 1 {Y-E (Y)} = £ DjC; 1 { Yl - E (Y,)} 



i=i 



(13) 

Here we have used the independence between clusters, along with the following 
Bartlett-type identity 

D = VpE (Y) = E{Y -u) = cov (Y, u) . 

From (joj) we furthermore obtain the sensitivity Sp = E (V ( g'j/' i g) and variability 
Vp = var (ipp) 



as 



D T C ' l D, V» = D T C ' l D. 



(14) 



The identity V p = —S3 is characteristic for quasi-score functions. We therefore 
conclude that ijjp is optimal within the class of linear estimating functions. This 
also follows from (13) as the best linear unbiased predictor is optimal among 



all linear predictors. 



S 



3.5 Association parameters 7 



Our approach is akin to that of Ma and J0rgensen (2007), but deviates 



by allowing for correlation structures within clusters. Furthermore Ma and 
J0rgensen used a closed form ad-hoc estimator for the association parameters 
Our estimation of 7 is based on Pearson estimating functions, following the 



path of Hall and Severini (1998) and J0rgensen and Knudsen (2004). For 7„ it 



is defined by 



V> 7 « (A 7) = E {rlWinn - tr (Wind)} 
i=l 
I 

= {tr {W in r, ir J) - tr (W in Ci)} 
i=i 
1 

= Y,tr{W in (nrj-d)}, 



(15) 



where = Y", — E (Yj) and VFj n are suitable weights. By linearity of £'(■) 
and tr(-) these estimating functions are clearly unbiased since E (fivj^j = Ci. 

In the conventional generalized estimating equations framework the Pearson 
estimating function hinges the estimation of association parameters on a work- 
ing correlation matrix used for defining var(Y"j) as shown in Q. In contrast, 
we emphasize the decomposition of the variance into components of dispersion 
and associates the process correlation matrix K (a) with an appropriate level 
in the hierarchy. 

For Win we use the weights proposed by Hall and Severini (1998), 



Wi 



dc: 



C 



.xdCi 



C7 1 . 



In the normal case these weights lead to quasi-score functions and in general 
they provide estimating functions that resemble the structure of the estimating 
functions (pi) for the regression parameters. 

From (|15[) we may derive the # m -sensitivity of i)J ln , namely 



1 1 sb** 



E 



d 
d9„ 



^tr{W in (nrj-d)} 



i=l 



i=l 



i=i 



Here we have used that the derivatives of r, do not depend on data so E { (dri/d6 m ) rj} 
E{n(drj/d0 m )} = 0. 
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The symmetry between the blocks S~ and S^p is highlighted by the forms 
of the nmth entries 



and 



respectively. 



{ S -yp} 7 



Vtr (c-^C-^ 



(16) 



(17) 



3.6 Bias correction 



The estimation of nuisance parameters may be subject to bias (McCullagh 



and Tibshirani, 1990, J0rgensen and Knudsen, 2004), caused by not taking into 



account the degrees of freedom spent on estimating the regression parameters. 



In the spirit of Godambe (1960), Heyde (1997) and J0rgensen and Knudsen 



(2004) we adjust the estimating function for bias rather than the estimate. The 



corrected estimating function for j n becomes 
VV (A7) = K(A7) + ^ (0,7) 



Y,tr{W in {r t rJ-C t )}+tr\ [^DjW^Di i^DjC^L 



i=l 
I 

E 

i=l 



tr{W, 



Ci)} 



tr (j^> J-i 



vi=l 



(7») 
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dJp/djn. The Godambe information Jp, see Section 3.7, plays 



where J 

a role in the estimating equation context analogous to that of the Fisher infor- 
mation in the likelihood framework, with J^ 1 being the asymptotic variance of 

(3. The penalty term 6 7n (/3, 7) therefore represents the 7-dependency of Jp, 
weighted by the precision of the estimate /3. In this way it corrects for the effect 
upon ip ln ^/3 7 ,7^ of using (3 y . 

We note that b ln (/3,7) = d\og ^ J^ 1 \ /dj n , which may be a more conve- 
nient form in some applications. 

Since & 7n (/3, 7) does not depend on the data, we obtain the 7- and j3- 
sensitivity of i/> 7 (/3, 7) by amending and S^p respectively, with the 7- and 
/3-derivatives, of the penalty term, respectively. For the nmth entries we obtain 

9 -K(Pn) 



and 



d 



M/3, 7 )=tr fj? n V J ^ m V- J 



The derivatives of J/3 are listed in Appendix 2. 



(7n,/3m) ■ 



(18) 



(19) 
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3.7 Godambe information J# 

For joint inference on 6 = (/3 T ,7 T ) T we use the asymptotic property, valid 
under mild regularity conditions 

where J^ 1 = S f) 1 VgSQ T , the inverse Godambe information or the sandwich 



'o 

estimator. 



The structure of the "bread" Sg 1 in the sandwich estimator is (pL with 



blocks listed in (14), (16) and (17). The lower blocks, associated with 7 are 



however amended with terms for bias correction as given by (18) and (19) 



The "meat" part Vq is the variability of ipg and may be structured analo 
gously 

~Vp Vp 1 
V-f/3 V 1 



V, 



where obviously Vp~, = V^p- 

Using (p| and (20) J^ 1 may be written as 



(20) 



j/3 3 M 

J7/3 j 7 

-l 13 -1 

— SjpSp 



Vp 

V^p v. 



7 



sr 1 

7 



s p ly p s p l 
-S^pS^Vp + v. 







7/JJ Sp 1 



_q-lqT q-1 
^7 

-VpS^S^p + V 



s- 1 
s 7 



7/3 

-1 



r 1 

7 



(21) 



where L = S^pSa 



VpS^S^p 



V. 



7/3 



^7/3 Sp 1 Sjp ■ 



The upper left block of 1 shows that the the asymptotic variance of (3 is 
unaffected by the estimation of 7. On the other hand the quantity L in the 
lower right block represents the inflation of the asymptotic variance of 7 caused 
by the estimation of /3. By ( 14 ) Sp = —Vp and therefore the upper right block 
of reduces to S^ 1 (S^p + V^p) S' 1 . If S^p + V^p = then S e = -V g 
and xj) would have been a quasi score. In that sense the S^p + V^p measures 
how much t/> deviates from being quasi-score. 

Except for Vp, the blocks rely on 3rd and 4th moments. For practical 
use this seems less tractable and we will instead employ empirical variabilities 

;T\T 



of -ip = (ipj,^) , defined by V ,Em P = Ei ^i(^i(^) T ■ % plugging in 
Vo : Emp we obtain the empirical sandwich estimator, J^mp = ^^Vo 



T" 1 



S p 1 ^ P,EmpS p 1 



-Vp,EmpSp S^p + V^p Emp J 5 7 



s^ 1 



-S~fpSp 1 Vp,Emp 



V. 



7/3,Emp I &p 



L + V 7; Emp^) S~ 

(22) 



-1 
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where L 



V /3,EmpS ^ 3^(3 



V 



IcT 



7/3,Emp y 

We may replace V^Emp by V p = Sp 1 to obtain the semi-empirical sand- 
wich estimator 



r 1 

'6»,Sem 



d /9 



^7 



> 7 /3 



+ V 



7/3, Emp 



s- 1 

*/3 



^7,3 + ^7/3,Emp 



^7 



^7 



Z + F 



7, Emp 



S^ 1 

s 7 



where now i 



The bias correction 6 7 (7, (3) applied to V7 causes that part of -0 to have a 
non-zero mean value. 



4 Simulation 

Some key properties of our method were addressed by an extensive simu- 
lation study in which 4000 data sets were simulated for each of 36 different 
configurations specified by combinations of the following Tweedie parameters: 
n G {1-5, 2-0, 2-5, 3-0} , r 2 G {1-5, 2-0, 3-0} and r 3 G {1-0, 1-5, 2-0, 3-0}. Table [| 
lists the model parameter settings used for the simulations. These settings are 
repeated across all combination of the latent variables Tweedie parameters r\ 
and r*2. The data sets were simulated with 15 clusters each, and each cluster 
consisting of latent ar(1) time series of length 30 and a log-linear regression for 
the fixed part. The correlation parameter a and the response dispersion param- 
eter p 2 varied with the response model. Except for the Poisson case, for which 
we used an intercept of 4-0, all other intercepts and all slopes were identical 
across scenarios. For other response models than the Poisson, the parameters 
were chosen to attain similar second moments for the marginal response. The 
coefficient of variation of the marginal response, based on these settings, ranged 
from 0-497 (Poisson) to 0-579 (Gamma). 

As ipp is a quasi score estimating function, with well known and optimal 
properties the simulation study naturally focuses on the estimates of the asso- 
ciation parameters. 



Table 2: Parameter settings used for the simulations. 



Response Model 




Reg 


•ression 




Dispersion 


Correlation 


Distribution 




A) 


Pi 


a 2 


OJ 2 


P 2 


a 


Poisson 


1-0 


4-0 


0-3 


0-05 


0-15 


1-0000 


0-40 


Compound Poisson 


1-5 


1-6 


0-3 


0-05 


0-15 


0-1175 


0-55 


Gamma 


2-0 


1-6 


0-3 


0-05 


0-15 


0-0850 


0-50 


Inverse Gaussian 


3-0 


1-6 


0-3 


0-05 


0-15 


0-0200 


0-40 
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Figure 1: Median values of a 2 for bias-corrected (o) and un-corrected (•) esti- 
mation compared with the true value a 2 = 0-05 (horizontal line). 



Robustness 

Simulations with varying configurations of r% and r2 was used for studying 
the assumed robustness against the lack of knowledge about the Tweedie pa- 
rameters driving the latent process. Along the same lines we investigated how 
the model performed across an appropriate range of the Tweedie parameter r% 
for the response variable. 

Figure [l] shows the median values of estimates of a 2 , for all combinations 
of the Tweedie parameter considered. Corresponding plots for w 2 , p 2 and a, 
not included in the article, show similar patterns across the range of r\ values 
considered. Also there seems very little difference in the patterns across the 
range of T2 values for p 2 and a, whereas a 2 and cD 2 show a markedly higher bias 
for T2 = 3 than for r<i = 1-5,2. Our approach hence appear reasonably robust 
against varying specifications of the latent process. The asymptotic variances 



of the estimates of the association parameters (22), enables us to compute es 



timates of coverage probabilities for 95% aymptotic confidence intervals, based 
on these. These coverage probabilities are listed in Table [3} While the coverages 
indicate the standard errors of the regression parameters to be precise, the cov- 
erages are consistently much too big for the association parameters, indicating 
overestimation of their standard errors. 



Bias correction 

The magnitude of the nuisance parameter bias correction is assessed by a 
duplicate analysis of the simulated data sets, except that the second estimation 
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Table 3: Summary of coverages for 95 % confidence intervals across varying 
configurations. 



Statistic 


A) 


Pi 


a 2 


u 2 


P 2 


a 


1st Qu. 


73-54% 


91-62% 


99-95% 


97-55% 


99-12% 


98-91% 


Median 


92-78% 


94-29% 


100-00% 


98-98% 


99-44% 


99-32% 


3rd Qu. 


93-09% 


94-68% 


100-00% 


99-83% 


99-66% 


99-85% 



is done without bias correction. 

Within the range of configurations considered here, there were generally 
little bias on the average of the estimated association parameters. Apart from 
a few minor exceptions, that may well be referred to sampling error, the bias 
correction pulled the estimates closer to their true values. The bottom level 
dispersion parameter a 2 shows markedly the highest correction. 



5 Data Analysis 



Knowledge about the growth of fish is important for the assessment of fish 
biomass. For this purpose, many fisheries management programmes sample 
otoliths on a regular basis. An otolith is a structure located in the inner ear of 
fish and is built by deposit of calcium carbonate, protein and a variety of trace 
elements. It carries information about age and growth patterns, by means 
of alternating opaque and translucent bands. When viewed in transmitted 
light, a translucent band represents a low level of deposition of proteins in the 
calcium carbonate crystal structure corresponding to a period of slow growth 



(Mosegaard and Titus, 1987). Sub-seasonal bands, representing daily cycles, 



can sometimes be identified within the annual bands (Pannella, 1971). 



The data collected and analysed by Clausen et al. (2007) contains mea- 



surements of daily growth bands of otoliths collected from juvenile herrings 
(Clupea harengus). The age in days was determined, by counting bands, for 
each of the sampled specimen and along with the time of sampling they were 
categorized as being offspring from one of three spawner types: autumn, winter 
and spring. These are distinct stock components but mix on the nursery and 
feeding grounds. For stock assessment purposes, it is of interest to be able to 
discriminate between them. Clausen et al. (2007) used otolith characteristics 



for this purpose. 

With each fish being a cluster and the sequence of bands within fish giving 
the longitudinal structure, otoliths measurements are amenable to our frame- 



work. We analysed the data from Clausen et al. (2007) to illustrate the use of 
our model for this type of data. 

For compatibility across the collection of otoliths, the band widths are mea- 
sured along similar radii on all otoliths. If the band marks are not all clearly 
identifiable along this transect, two or more adjacent bands are aggregated and 
the total width of these bands is taken (Fig. [2]). The count of bands between 
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Figure 2: A photo of an otolith with radius and band marks. The trace marks 
for band 18-23 are identified outside the radius and only the span of these 
bands can be measured. HC: Hatch Check. 

two measurement marks is then based on intermediate band marks identified 
elsewhere on the otoliths. It is common practice to use the average of such 
aggregated bands, in place of correct measurements; a feature also found in the 
present data. To avoid successive values obtained from the same aggregation 
of bands, it sufficed to sub-sample every 8th value for our analysis. 

The sampled fish have different ages and therefore display differences in the 
lengths of their band width series. To avoid bias caused by data selection, we 
truncated the sequences of band measurements to the shortest sequence within 
each spawning category. Furthermore the first 10 bands were left out of the 
analysis, as their measurements were considered too imprecise. 

Two variant models were estimated: one fixing the response Tweedie pa- 
rameter r3 = 2, corresponding to a gamma distribution and the other one 
estimating this parameter. Judging from exploratory plots (Fig. [3j) of the ob- 
served width measurements the fixed part could appropriately be modeled as 
2nd order polynomials of the bands and with potential different coefficients for 
the three seasons: width ~ (band+band 2 ) *season using the log link. 

The initial models contained 9 regression parameters. Autumn was chosen 
as base level for the season factor, to enable a direct comparison between au- 
tumn and winter, as these appeared most alike among the the three seasons. 
The models were reduced to final models, with 6 regression parameters, through 
a succession of Walds test and re-estimations. Based on inspection of the auto- 
correlation and the partial auto-correlation function for individual otoliths, an 
ar(1) model was deemed appropriate. Estimates for the parameters of the final 
models and standard errors are listed in Table H 

The two regressions are estimated almost exactly the same, whether r% is 
estimated or assumed known. This reflects the 7 insensitivity of i/^. From 
the fit we conclude that autumn and winter differ only by the 1st order term 
whereas autumn and spring differ by all three terms. The fitted curves from 
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Figure 3: Width measurements at every 8th band. Thick white line indicates 
average measurements over otoliths. Top row: observed, bottom row: esti- 
mated. 

Table 4: Parameter estimates, standard errors (SE) and p-values for both fixed 
and estimated r% models. * The p- value for r% applies to the hypothesis Ho : 
r3 = 2. aut: autumn; win: winter; spr: spring. 



Parameter 




Fixed r% = 


2 






Estimated r% 




Est 


SE 




p- value 


Est 


SE 


p- value 


/^aut+win 


0-3118 


0-0292 




< 0-0001 


0-3113 


0-0292 


< 0-0001 


/^spr 


1-1304 


0-0553 




< 0-0001 


1-1288 


0-0556 


< 0-0001 


/^spr :band 


0-0187 


0-0016 




< 0-0001 


0-0187 


0-0016 


< 0-0001 


/3win:band 


0-0032 


0-0003 




< 0-0001 


0-0032 


0-0003 


< 0-0001 


/^(aut+win) :band 2 


1-9 x 10- 


5 1-4 x 10- 


6 


< 0-0001 


1-9 x 10- 5 


1-4 x 10- 6 


< 0-0001 


/-^spr :band 2 


-0-0001 


9-9 x 10- 


6 


0-0001 


-0-0001 


1-0 x 10~ 5 


< 0-0001 


a 2 


0-0040 


0-1639 




0-9807 


0-0042 


0-1627 


0-9796 


CO 2 


0-0277 


2-6770 




0-9918 


0-0275 


3-4696 


0-9937 


P 2 


0-0153 


4-1817 




0-9971 


0-0115 


7-0517 


0-9987 


a 


0-8321 


0-0823 




< 0-0001 


0-8325 


6-7673 


0-9021 












2-2700 


1-7627 


0-8783* 
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the fixed-r3 model are plotted in Fig. [3j 

The association parameter estimates were very similar for the two models, 
but giving quite different standard errors. The rather large standard errors for 
the dispersion parameter estimates confirms the impression from the simulations 
in Section |4| that the use of empirical variabilities in the sandwich estimator 
(22) may lead to standard errors too big to be of any practical use. This seems 
to be the cost of avoiding the use of higher moments . The standard errors for 
the correlation parameter a in the fixed-r3 case appears to be more realistic but 
raises dramatically when r% is estimated. In that case the standard error for 
r3 seems moderate. The a parameter applies on the scale of the sub-sampling 
frequency. A back calculation to a day-to-day serial correlation and assuming 
the ar(1) model, leads to a value of about 0-97. 



6 Discussion 

There are a few useful extensions of the method worth mentioning here. 
It would be straightforward to extend the model to multiple levels of random 
effects, adding further levels by repeated use of conditional Tweedie distribu- 
tions. A second useful extension would be to allow regression modelling for the 



association parameters, along the lines of Davidian and Carroll (1987). 

Finally, the model can be easily adapted to handle multivariate data. By 
use of conditionally independent Poisson response variables, the model can 
be extended to binomial or categorical data, leading to beta-binomial-like or 
Dirichlet-multinomial-like models. This topic is addressed in a companion ar- 
ticle currently in preparation. 
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Appendix 1 

Covariance structure 

Prom the model specification Q-Q we derive the the marginal covariance 
between two observations within the ith cluster. This is done in three steps by 
means of the law total of variance. From E {Z) i = 1 and var (Zj) = a 2 we first 
get 

cov (Zu, Z it i) = E {cov (Zit, Z it , | Z*)} + cov (E (Z it | Z*) , E (Z it > | Z*)) 
= 8%E {uj 2 Zi) + a^ 2 var (Z) 
= dj'uj 2 + a+ 2 a 2 , 

from which we obtain 

oo oo 

cov (Q it , Q iV ) = ^2 ^2 asOLs ' cov Z it'-s>) 

s=0 s'=0 

oo 

= ^ /J «sa s +|t-t'| + cr 2 , 

and finally arrive at 

cov (y itj ^) = £ {cov (Ytt, Y w | Z)} + cov {£ (Y tt | Z) , E (Y Vt , | Z)} 
= <5fvar (Yit) + cov (Q it , Q w ) 



Appendix 2 

Process Correlation Matrix .K" (a) for ma(?) and ar(l) processes 

The latent process linear filter Qa = Yl%=Q a s Z it-si induces the process 
correlation matrix K (a), with it'th entry {K (at)} tt , = Yl^Lo a s a s+\t-t'\- 

An MA(q) process is given by ao = 1 and a s = for s > q and has first and 
second derivative matrices with tt'th entries given by 

jg^-^(a) j = otk+\t-t'\h<q-\t-t'\ + o:k-\t-t'\h>\t-t'\ 



and 

d 2 



-K{a)\ =5fC + C"t? 



respectively. Here 8% , 5k< q -\t-t'\ etc are variant forms of the Kronecker delta 
with obvious definitions. 
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An ar(1) process is given by a s = a s ; a £ (0, 1), from which we get the 
tt'th entry of K (a) 

oo oo \t—t'\ 

{K(a)} tt , = ^a- + M'l = = 

s=0 s=0 

Consequently the fcth sub- and super-diagonal of j^K (a) and jj^pK (a) have 
elements 

d ( a k \ (2 - k) a k+1 + ka^ 1 



da\l-a 2 )~ (1-a 2 ) 2 
and 

d 2 ( a k \ _ (k 2 -5k + 6) a( fc + 2 ) + (-2k 2 + 6k + 2)a k + (k- 1) ka^^ 



da 2 \l-a 2 J " (1-a 2 ) 3 
respectively. 

A Appendix 3 

Derivatives of Jp 

The derivatives of Jp involved in calculating the 7- and f3- sensitivities of 

^7 (A 7) are 

/ 

r(7n) 

'i 



jp =~Y, D i WmDi 

i=l 

I 

i=i 
1 

j(7n,7 m ) = J2dJ (w im c t w m + W m C t W tm - c- l c^ lm) c^) Di 

i=l 

I 

J$ M = Y, Df m)T W in D l + DfW in Df m) + DjW ( ^ m) A, 



i=i 

where W m = -^C- 1 = C" 1 (^C t ) C" 1 and = etc. 
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